On Evolutionary Spatial Heterogeneous Games



H. Fort a 

a Instituto de Física, Facultad de Ciencias, Universidad de la República, Iguá 4225,
11400 Montevideo, Uruguay



Abstract 

How cooperation between self-interested individuals evolves is a crucial problem,
both in biology and in the social sciences, that is far from being well understood.
Evolutionary game theory is a useful approach to this issue. The simplest model
that takes the spatial dimension into account in evolutionary games is formulated
in terms of cellular automata with just a one-parameter payoff matrix. Here, the
effects of spatial heterogeneities of the environment and/or asymmetries in the
interactions among the individuals are analysed through different extensions of
this model. Instead of using the same universal payoff matrix, bimatrix games in
which each cell at site (i,j) has its own 'temptation to defect' parameter T(i,j)
are considered. Firstly, the case in which these individual payoffs are constant
in time is studied. Secondly, an evolving evolutionary spatial game with
T=T(i,j;t), i.e. one in which the temptation not only depends on the position but
also evolves in time (by natural selection), is used to explore the combination
of spatial heterogeneity and natural selection of payoff matrices.

Key words: Complex adaptive systems, Evolutionary Game Theory, Cellular 
Automata 



1 Introduction 



Cooperation is ubiquitous in nature [1] and essential for evolution [2]. How
cooperative behaviour evolved among self-interested individuals is an important
open question in biology and the social sciences. A powerful tool to analyse
this issue is evolutionary game theory [3]-[5]. Having originated as an
application of the mathematical theory of games to biological contexts, it has
become of increasing interest to economists, sociologists and anthropologists,
as well as philosophers. Of particular interest are two-player games where each player
has a strategy space containing two actions or '2x2 games'. In the problem 
of cooperation vs. competition the two actions or strategies are: to cooperate
(C) or to defect (D). The payoff of a player depends on its action and that
of its coplayer. Assuming symmetry between the two players, i.e. that they
are interchangeable, there are four possible values for this payoff, R, S, T and P,
corresponding respectively to the four situations [C,C], [C,D], [D,C] and [D,D]
(the first entry corresponds to the action of the player and the second to the
action of its opponent). The paradigmatic example is the Prisoner's Dilemma
(PD), in which the four payoffs are ranked as T>R>P>S. So T is the 'temptation'
to defect, R is the 'reward' for mutual cooperation, P is the 'punishment'
for mutual defection and S is the 'sucker's payoff'. Clearly it pays more to
defect whatever the opponent does (T>R and P>S), but the dilemma is that if both
play D they get P, which is worse than the reward R they would have got had they
both played C.
The PD is connected with two other social dilemma games [6], [7]. When the
damage from mutual defection is increased so that it finally exceeds the damage
suffered by being exploited, i.e. T>R>S>P, the new game is called
chicken [8]. This game thus applies to situations in which mutual defection
is the worst possible outcome for both players, as happens in most animal
contests. On the other hand, when the reward surpasses the temptation,
i.e. R>T>P>S, the game becomes the Stag Hunt (SH) [9]. Several animal
behaviours have been described as stag hunts, for example the
coordination of slime molds. When individual amoebae of Dictyostelium
discoideum are starving, they aggregate to form one large body. If they all
act together they can reproduce successfully, but this success depends on
the cooperation of many individuals.
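
The three payoff rankings quoted above can be checked mechanically. The following is only a minimal sketch: the function name `classify_game` and the numerical payoffs in the usage lines are illustrative assumptions, not values taken from the cited literature.

```python
def classify_game(R, S, T, P):
    """Name the 2x2 social dilemma defined by a strict payoff ranking."""
    if T > R > P > S:
        return "Prisoner's Dilemma"
    if T > R > S > P:
        return "Chicken"
    if R > T > P > S:
        return "Stag Hunt"
    return "other"  # ties or any ranking not discussed in the text


# Illustrative (hypothetical) payoff values for each ranking.
print(classify_game(R=3, S=0, T=5, P=1))  # T>R>P>S
print(classify_game(R=3, S=1, T=5, P=0))  # T>R>S>P
print(classify_game(R=5, S=0, T=3, P=1))  # R>T>P>S
```

Because the rankings are strict, a payoff set with ties (for instance S=P) falls outside all three games and is reported as "other".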

Classical evolutionary game theory constitutes a mean-field approximation
which does not include the effect of the spatial structure of populations. Axelrod
[1] suggested placing the individual "players" in a two-dimensional spatial
array, each playing with its neighbours. Such cellular automata (CA) were explored
by Nowak and May [10], who used a simple one-parameter payoff matrix
specifying a game that is on the frontier between the PD and chicken. They found
that the spatial structure allows cooperators to build clusters in which the
benefits of mutual cooperation can outweigh losses against defectors. This
enables the maintenance of cooperation, in contrast to the spatially
unstructured game where defection is always favoured. Different variations of this
Nowak-May (N-M) model have been studied, taking into account distinct aspects
such as the changes introduced by modifying the update rule [11], the effect of
allowing voluntary participation [12], the dependence on the graph topology
[13], [14], the effect of noise and connectivity structure [15], and the
consequences of environmental stress imposed by requiring a minimum threshold
to survive [16].

Here my goal is to extend the N-M model in several directions with the aim
of studying the following issues.

(1) The consequences of a heterogeneous environment. This can be modelled
through a payoff matrix varying from place to place. For example, in a
more hostile region the reward for mutual cooperation can surpass the
temptation to defect, thus switching the game from the PD to the SH,
etc.

(2) The effect of asymmetries in the interactions between the agents. Indeed, 
asymmetries in the costs and benefits of cooperating have been mentioned 
as an important ingredient in the evolution of cooperation [17]. 

(3) The emergence of payoff matrices from the natural selection process itself.
Determining the ranking of the payoff values that explains the
results of experiments or field observations is far from trivial.
For instance, in the case of animals there is controversy over whether the PD
or chicken is the appropriate game [18,19]. Likewise, many circumstances that
have been described as a PD might also be interpreted as an SH, depending
on how fitness is calculated [9]. Moreover, experimental studies indicate
that the payoff matrix is not constant for very simple individuals like
viruses [20].

Hence, the basic idea is, instead of using the same universal payoff matrix, to
use heterogeneous bimatrix games [5], i.e. each player has its own particular
payoff matrix. I begin by studying a first variant of this model, in which the
cell payoff matrix is regarded as something completely external to the
individual, corresponding to environmental properties. These properties change from
place to place but not with time (at least not on the time scale of the
organisms), and thus the matrix diversity is kept fixed during the evolution.

Then I focus on a second variant in which the cell payoff matrix is assumed to
reflect 'phenotypic' properties, internal to organisms, so that it evolves
together with the strategy.



2 The Model 



The N-M model consists of a cellular automaton in which cells represent the
simplest possible agents: unconditional players that play against their
neighbours. That is, at each generation or time step t of the game, there are
those who always play C and those who always play D. Different types of
neighbourhoods can be considered, for example the von Neumann neighbourhood with
z=4 neighbour cells (the cells above, below, left and right). The score U of a
given player is the sum of the payoffs it collects against its z neighbours. The
payoff matrix depends on just one parameter, T, while the other three have fixed
values: R=1, S=P=0. The dynamics is synchronous: all the agents update their states
simultaneously at the end of each lattice sweep (see footnote 1). Natural
selection is mimicked through the simplest 'Imitate the Best' (in the
neighbourhood) update rule [21]: each individual, after playing against its
neighbours, adopts the strategy of the most successful neighbour or msn (the one
that collected the highest score U in the neighbourhood in this round).
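
One generation of the uniform-T model described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the author's implementation: the function names are hypothetical, the lattice is periodic, and the focal cell is assumed to compete with its four neighbours and to keep its own strategy on ties (the text does not specify tie-breaking).

```python
def payoff(me, other, T, R=1.0, S=0.0, P=0.0):
    """Payoff collected by `me` against `other` (each is 'C' or 'D')."""
    if me == "C":
        return R if other == "C" else S
    return T if other == "C" else P


def neighbours(i, j, L):
    """von Neumann neighbourhood (z=4) on an L x L periodic lattice."""
    return [((i - 1) % L, j), ((i + 1) % L, j), (i, (j - 1) % L), (i, (j + 1) % L)]


def scores(grid, T):
    """Score U of every cell: sum of payoffs against its z=4 neighbours."""
    L = len(grid)
    return [[sum(payoff(grid[i][j], grid[a][b], T) for a, b in neighbours(i, j, L))
             for j in range(L)] for i in range(L)]


def step(grid, T):
    """One synchronous 'Imitate the Best' generation; returns the new grid."""
    L = len(grid)
    U = scores(grid, T)
    new = [row[:] for row in grid]
    for i in range(L):
        for j in range(L):
            # Assumption: the focal cell is a candidate too and wins ties,
            # because max() returns the first maximal element of the list.
            candidates = [(i, j)] + neighbours(i, j, L)
            a, b = max(candidates, key=lambda cell: U[cell[0]][cell[1]])
            new[i][j] = grid[a][b]
    return new


# A single defector in a 3 x 3 sea of cooperators, with T = 1.5.
grid = [["C", "C", "C"], ["C", "D", "C"], ["C", "C", "C"]]
for row in step(grid, T=1.5):
    print("".join(row))  # prints CDC, DDD, CDC on successive lines
```

All scores are computed from the old grid before any cell changes state, which is what makes the update synchronous.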

In this work two variants of the N-M model are studied: 

• Heterogeneous non evolving temptation. 

As a first step let's consider the version with the temptation parameter T
varying from cell to cell as a uniform random variable fixed in time: T(i,j),
where i and j are the coordinates of the centre of the cell. Since P=S, when
T(i,j)>1 the payoff matrix of the cell at (i,j) is on the frontier between
the PD and chicken; let's denote this game by PD1. On the other hand,
when T(i,j)<1 the game played by the cell is on the frontier between the
PD and the SH; let's denote it by PD2.

• Heterogeneous evolving temptation. 

Next I concentrate on the case in which the temptation, besides varying
from cell to cell, evolves with time, i.e. we have a variable T(i,j;t) depending
on the cell coordinates as well as on the time step or generation t. It starts, at
t=0, as a uniform random variable and then evolves by natural selection:
at each generation t each cell adopts the payoff matrix of the msn of its
neighbourhood, together with its strategy.
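
Both variants can be expressed with one update function that differs only in whether the temptation is copied along with the strategy. The sketch below is an illustration under assumptions: the names `step` and `evolve` are hypothetical, each cell is assumed to use its own T(i,j) when it defects against a cooperator (with R=1, S=P=0), and the focal cell is assumed to win ties.

```python
def neighbours(i, j, L):
    """von Neumann neighbourhood (z=4) on an L x L periodic lattice."""
    return [((i - 1) % L, j), ((i + 1) % L, j), (i, (j - 1) % L), (i, (j + 1) % L)]


def step(strategy, temptation, evolve):
    """One synchronous generation; returns (new_strategy, new_temptation).

    evolve=False: T(i,j) stays fixed (non evolving variant).
    evolve=True:  each cell also copies the T of its most successful neighbour.
    """
    L = len(strategy)
    U = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(L):
            # With R=1 and S=P=0 only cooperating neighbours contribute:
            # a cooperator earns 1 per C neighbour, a defector earns its own T.
            gain = 1.0 if strategy[i][j] == "C" else temptation[i][j]
            U[i][j] = gain * sum(strategy[a][b] == "C" for a, b in neighbours(i, j, L))
    new_s = [row[:] for row in strategy]
    new_t = [row[:] for row in temptation]
    for i in range(L):
        for j in range(L):
            # Assumption: the focal cell is listed first, so it wins ties.
            candidates = [(i, j)] + neighbours(i, j, L)
            a, b = max(candidates, key=lambda cell: U[cell[0]][cell[1]])
            new_s[i][j] = strategy[a][b]
            if evolve:
                new_t[i][j] = temptation[a][b]
    return new_s, new_t


# One defector with a high temptation among low-temptation cooperators.
strategy = [["C", "C", "C"], ["C", "D", "C"], ["C", "C", "C"]]
temptation = [[0.5, 0.5, 0.5], [0.5, 2.0, 0.5], [0.5, 0.5, 0.5]]
s1, t1 = step(strategy, temptation, evolve=True)
print(s1[0][1], t1[0][1])  # the cell next to the defector copies both D and T=2.0
```

With `evolve=False` the same call changes the strategies in the same way but returns the temptation lattice unchanged.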

Different intervals [T_min, T_max] for T(i,j;0) are explored. T_max is taken in
[1, 2.5] in steps of 0.1, and for each of these values two values of T_min are
considered: 0 and 1. Since T_max > 1, the possible games are determined by T_min:
when T_min=1 the payoff matrix of every cell corresponds to PD1. On the other
hand, the case T_min=0 is less restrictive: in some cells the game can be PD1 and
in others PD2.

Square lattices of sizes ranging from 100 x 100 to 500 x 500, with periodic
boundary conditions, are used.

In the initial configuration for strategies half of the cells, chosen at random,
play C and the other half play D. The temptation T(i,j;0) varies from
cell to cell as a uniform random variable in the interval [T_min, T_max].
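
The initial configuration just described can be generated as in the following sketch. The function name `initial_state` and the `seed` parameter are assumptions introduced for illustration; the half-and-half split is enforced exactly by shuffling a list that already contains the right number of each strategy.

```python
import random


def initial_state(L, t_min, t_max, seed=None):
    """Return (strategy, temptation) lattices of size L x L.

    Exactly half of the cells (rounded down for odd L*L) play C, placed at
    random, and T(i,j;0) is drawn uniformly from [t_min, t_max].
    """
    rng = random.Random(seed)
    n = L * L
    flat = ["C"] * (n // 2) + ["D"] * (n - n // 2)
    rng.shuffle(flat)
    strategy = [flat[i * L:(i + 1) * L] for i in range(L)]
    temptation = [[rng.uniform(t_min, t_max) for _ in range(L)] for _ in range(L)]
    return strategy, temptation


strategy, temptation = initial_state(L=100, t_min=0.0, t_max=2.0, seed=1)
print(sum(row.count("C") for row in strategy))  # 5000 cooperators out of 10000
```

Passing a different `seed` for each run gives the independent initial configurations over which averages are taken.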

For each generation t the average fraction of cooperators ⟨c⟩ and, in the
second variant, the average temptation ⟨T⟩ are computed until the steady state
is reached (typically, this takes between 500 and 1000 generations). The
symbols ⟨·⟩ denote averages that are both spatial, over all the lattice cells,
and over 500 runs, each starting from a different initial configuration (to
ensure independence of the initial conditions).

1 It is known that synchronous updating of the state of a lattice can induce
artificial effects. I therefore studied what happens when performing an
asynchronous update, i.e. each site is updated immediately after being compared
with its neighbourhood. I found qualitatively the same results.



3 Results 



3.1 Heterogeneous non evolving temptation 



Both the z=4 von Neumann neighbourhood and the z=8 Moore neighbourhood
were studied, producing qualitatively the same results. Therefore, all the results
presented in this work are for z=4.

Fig. 1 shows the asymptotic values of ⟨c⟩ vs. T_max. To allow comparison with
the N-M model its results are also plotted (in this case the temptation is the
same universal parameter T for all the cells, i.e. T_max = T_min = T). Roughly,
the z=4 N-M CA exhibits four different regions along the T axis, each
corresponding to a different asymptotic value of ⟨c⟩: 1<T<4/3, 4/3<T<3/2,
3/2<T<2 and T>2. Note that the heterogeneous temptation model
yields higher values for the asymptotic fraction of cooperators. For T_min=1 it
coincides with the N-M model up to T_max=4/3, reproducing its highest step, but
then it departs from it. In the case of T_min=0 the differences are more drastic:
no steps are observed, and even for T_max=2.5 there is cooperation (⟨c⟩ ≈ 1/3).
In fact ⟨c⟩, as expected, is always greater for T_min=0 since, on average, the
temptation to defect is smaller. For example, for T_max=2, when T_min changes
from 1 to 0, the fraction of cooperators jumps from ⟨c⟩ ≈ 1/2 to ⟨c⟩ ≈ 2/3.

Fig. 1. Asymptotic value of ⟨c⟩ vs. T_max for z=4: Nowak-May model (*),
heterogeneous non evolving temptation model with T_min=1 (filled circles) and
T_min=0 (filled squares). Symbols are bigger than error bars.



3.2 Evolving heterogeneous temptation 



Fig. 2 includes the data of Fig. 1 plus the corresponding asymptotic ⟨c⟩
produced by the evolving temptation variant, for the same values of T_max and
T_min. Note that this version also yields higher values for the asymptotic
fraction of cooperators than the N-M model. For T_min=1 the results are even more
similar to those of the N-M model: there is complete coincidence up to
T_max=3/2. When higher values of T are allowed, something new occurs: the
behaviour becomes non-monotonic; there is also a step, but now it is higher than
the one to its left. This is quite unexpected, because when larger 'temptation'
values {T(i,j;0)} are allowed, the fraction of cooperators also becomes larger.
For T_max>2 the step-like behaviour disappears. In the case of T_min=0 the points
interpolate between those of the corresponding heterogeneous non evolving
variant and the N-M model. In fact, they coincide with the points for T_min=1
over the step [T_max=4/3, T_max=3/2] and also show a non-monotonic variation for
T_max>3/2.

Fig. 2. The same as Fig. 1 plus the asymptotic value of ⟨c⟩ vs. T_max for the
heterogeneous evolving temptation model with T_min=1 (non-filled circles) and
T_min=0 (non-filled squares). Symbols are bigger than error bars.



In Fig. 3 the evolution of the average temptation ⟨T⟩ and of the fraction of
cooperators ⟨c⟩ is shown for T_max=2 and for both T_min=0 and T_min=1. As
expected, the value of ⟨T⟩ is lower for T_min=0 than for T_min=1. Notice the sort
of 'specular symmetry' between the curves of ⟨T⟩(t) and ⟨c⟩(t) when they evolve
from their initial values, ⟨T⟩_0=(T_max+T_min)/2 (i.e. (2+0)/2=1 for T_min=0 and
(2+1)/2=1.5 for T_min=1) and ⟨c⟩_0=0.5, respectively, to their steady state
values. There is a short transient in which ⟨T⟩ (⟨c⟩) grows (drops) very quickly
and then it decreases (increases) until it reaches its asymptotic value. These
opposite behaviours are quite surprising since an increase of the average payoff
for cheating is accompanied by an increase of the average cooperation level
(just the opposite of what happens in the case of non evolving temptation).

Fig. 3. ⟨T⟩ (below) and ⟨c⟩ (above) vs. the generation t (in both cases T_max=2)
for T_min=1 (o) and T_min=0 (□). Symbols are bigger than error bars.

Besides the average of the temptation ⟨T⟩, let's also analyse the evolution of
its distribution. Fig. 4 shows the asymptotic (after 200 time steps)
distribution of values of the temptation T for T_max=2 and for T_min=1 (A) and
T_min=0 (B), for a single initial condition. In both cases there is evolution from
a uniform distribution to a non-uniform one. The mean values of the temptation
produced by these histograms are, respectively, 1.615 and 1.42, both
in good agreement with the asymptotes for ⟨T⟩ in Fig. 3. In the case of
T_min=0 a gaussian envelope can be completely discarded (gray dashed curve).
For T_min=1 the situation is not so clear.

Fig. 4. Histograms of the asymptotic values of T for T_max=2: (A) T_min=1, (B)
T_min=0. In gray: gaussian distributions with the same mean ⟨T⟩ and standard
deviation σ_T.

The spatial patterns that emerge are illuminating. For instance, let's analyse
what happens for T_max=2 and T_min=0 for an arbitrary choice of initial
conditions. Fig. 5-(A) represents a typical steady state map showing clusters of C
agents (white) on a 'sea' of D agents (black) for a lattice of 100x100=10,000
sites. In Fig. 5-(B), all the agents that have a temptation T<1=R are marked
in grey. Notice that they are a subset of the C agents. In other words, all the
agents that were selected with low values of T are cooperators (the converse
is not true: many C agents have values of T>R=1). An explanation for this
is that at the end only "successful" players remain, i.e. players whose strategies
and phenotypes (temptation T) were copied by their neighbours. A player with a
small T clearly cannot be successful playing D, so it must have been playing C.

As a result of selection, the system evolves from an initial configuration with
L x L different payoff matrices (one per lattice cell) to a situation in which
far fewer matrices coexist. Starting in this case with 100x100=10,000 payoff
matrices one typically ends with around 500. This is shown in Fig. 6, which is
the corresponding map of asymptotic values of T(i,j). In this particular case
it consists of 548 'patches' of different tones of gray, each corresponding to a
selected value of T. One can recognize several of the domains that appear in
Fig. 5-(B). This is natural since the strategy (C or D) of the msn is copied by
agents together with its temptation. It also helps to understand why there
are no clusters with white and gray sites well mixed in Fig. 5-(B): the typical
length of the T-patches is much larger than the lattice spacing.
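
Two quantities mentioned above, the number of surviving payoff matrices and the number of patches, could be measured on a final T(i,j) map as sketched below. This is an assumption about how such counts might be made, not the procedure used for Fig. 6: a patch is taken here to be a 4-connected region of identical T on the periodic lattice, and the function names are hypothetical. Exact float comparison is safe in this setting because surviving values are copies of initial ones.

```python
from collections import deque


def count_distinct(temptation):
    """Number of different T values (payoff matrices) present on the lattice."""
    return len({t for row in temptation for t in row})


def count_patches(temptation):
    """Number of 4-connected regions of equal T on a periodic L x L lattice."""
    L = len(temptation)
    seen = [[False] * L for _ in range(L)]
    patches = 0
    for i in range(L):
        for j in range(L):
            if seen[i][j]:
                continue
            patches += 1  # (i, j) starts a region not visited before
            seen[i][j] = True
            queue = deque([(i, j)])
            while queue:  # breadth-first flood fill of the region
                a, b = queue.popleft()
                for x, y in (((a - 1) % L, b), ((a + 1) % L, b),
                             (a, (b - 1) % L), (a, (b + 1) % L)):
                    if not seen[x][y] and temptation[x][y] == temptation[a][b]:
                        seen[x][y] = True
                        queue.append((x, y))
    return patches


# Two isolated cells with T=1.2 in a background of T=0.7 (illustrative values).
t_map = [[0.7] * 4 for _ in range(4)]
t_map[0][0] = 1.2
t_map[2][2] = 1.2
print(count_distinct(t_map), count_patches(t_map))  # 2 distinct values, 3 patches
```

The two counts can differ, since the same T value may survive in several disconnected patches.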

The average individual score ⟨U⟩ can be approximated by

U ≈ 0.5⟨c⟩² + ⟨T⟩⟨c⟩(1 − ⟨c⟩).    (1)

Substituting the computed values of ⟨T⟩ and ⟨c⟩ into equation (1), one gets a
value that slightly overestimates ⟨U⟩. For example, for z=4: U ≈ 0.47 and ⟨U⟩ ≈
0.43.
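
Equation (1) is straightforward to evaluate numerically. The helper below is a sketch: the function name is hypothetical and the input values in the usage line are illustrative only, since the values of ⟨c⟩ behind the quoted 0.47 are not restated here.

```python
def mean_field_score(c, T):
    """Right-hand side of equation (1) for cooperator fraction c and mean temptation T."""
    return 0.5 * c ** 2 + T * c * (1.0 - c)


# Illustrative inputs (not taken from the simulations): c = 0.5, T = 1.0.
print(mean_field_score(c=0.5, T=1.0))  # 0.5*0.25 + 1.0*0.25 = 0.375
```

The estimate vanishes for c=0, where there are no cooperators to collect payoffs from, and reduces to 0.5 for c=1.
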
Fig. 5. Steady state map of strategies for T_max=2 and T_min=0. (A) C agents
(white) and D agents (black). (B) Agents with T>1=R: C (white), D (black); agents
with T<1=R in grey.

4 Discussion and Final Comments 



Two variants implementing heterogeneous extensions of the N-M model were
introduced and studied. These model versions can be regarded as two extreme
situations. On the one hand, in the case of 'temptations' T(i,j) constant in
time, we have the external environment point of view. That is, the payoff
matrix is regarded as something completely external to the individuals,
corresponding to the properties of the physical background in which they are
located. These properties change from place to place but not with time (at least
not on the time scale of the organisms), thus preserving the original complete
diversity of payoffs. For example, in sites where the environment is more hostile
the reward for mutual cooperation may become larger than the temptation,
switching the game from the PD to the SH, etc. On the other hand, when
the temptation evolves, we have the 'phenotypic' point of view. In this case,
one assumes that the payoff matrix reflects properties of an organism, i.e. is
part of its phenotype, so it experiences natural selection. Therefore the model
can cope with diversity and asymmetric interactions between 'players' whether
they are the product of a heterogeneous environment or of different 'phenotypes'
characterized by different temptations T. That is, for multiple reasons, it is
possible that the payoffs for you and your opponent are not equal (indeed this
is what generally happens in real life). Furthermore, as was mentioned, an
empirical determination of the payoffs can be very difficult, while variations in
the payoff values can dramatically alter theoretical predictions. Here, one of
my goals was to minimize the dependence of crucial model predictions, like the
evolution of cooperation, on model parameters. So I have chosen the simplest
agents, unconditional players (without memory or strategic parameters), and I
have not introduced a universal temptation parameter T for all the players.
Instead, starting with random heterogeneous distributions of T, I let steady
states with several domains characterized by different values of T emerge from
the very process of natural selection. All the results are quite robust: they do
not depend on particular payoff choices, on the lattice topology or on whether
the update is synchronous or asynchronous, and they do not rely on specific
characteristics of the agents. The dependence on the initial conditions is also
mostly removed by taking averages. Thus this really minimal model seems to combine
robustness with realism and simplicity.

Both model variants yield a higher value of the asymptotic fraction of
cooperators ⟨c⟩ than the one produced by the uniform N-M model. It is worth
remarking that for this heterogeneous payoff matrix model, in general, ⟨c⟩
attains higher values for the case of constant payoff matrices than for the
evolving ones. Also, for both model variants, in general, a decrease of T_min
(from 1 to 0) promotes a higher level of cooperation. This seems natural since
less competitive games (limit cases of the SH) are allowed.

The behaviour of ⟨c⟩ as a function of T_max is approximately monotonic
for constant payoff matrices, while in the case of evolving payoff matrices it
exhibits non-monotonic variations. So, in this second variant, an interesting
(unexpected) prediction is that by allowing larger initial 'temptation' payoffs
{T(i,j;0)} the number of cooperators can grow.

Concerning the emergence of spatial patterns, the evolving temptation version
organises into a steady state that exhibits a rich structure with several
'patches' of agents using the same payoff matrix. In addition, all the agents
that reach the steady state with values of T<R=1 (when T_min=0) are
cooperators.

In this work the evolution is driven just by natural selection. The effect of
incorporating mutations, the other driving force of evolution, is something
important to explore. Work is in progress in that direction, and
preliminary results show that the main conclusions presented here remain
qualitatively the same. It is also known that modifications of the update rule,
and in particular asynchronous dynamics, may produce important changes
[11], [14], [21]. The effect of changing the update rule deserves
to be analysed separately in another work.